-
- A.OUT(5) UNIX Programmer's Manual A.OUT(5)
-
- NAME
- a.out - format of executable binary files
-
- SYNOPSIS
- #include <a.out.h>
-
- DESCRIPTION
- The include file <a.out.h> declares three structures and several macros.
- The structures describe the format of executable machine code files
- (`binaries') on the system.
-
- A binary file consists of up to 7 sections. In order, these sections
- are:
-
- exec header Contains parameters used by the kernel to load a binary
- file into memory and execute it, and by the link editor
- ld(1) to combine a binary file with other binary files.
- This section is the only mandatory one.
-
- text segment Contains machine code and related data that are loaded
- into memory when a program executes. May be loaded
- read-only.
-
- data segment Contains initialized data; always loaded into writable
- memory.
-
- text relocations Contains records used by the link editor to update
- pointers in the text segment when combining binary
- files.
-
- data relocations Like the text relocation section, but for data segment
- pointers.
-
- symbol table Contains records used by the link editor to cross ref-
- erence the addresses of named variables and functions
- (`symbols') between binary files.
-
- string table Contains the character strings corresponding to the
- symbol names.
-
- Every binary file begins with an exec structure:
-
- struct exec {
- unsigned long a_midmag;
- unsigned long a_text;
- unsigned long a_data;
- unsigned long a_bss;
- unsigned long a_syms;
- unsigned long a_entry;
- unsigned long a_trsize;
- unsigned long a_drsize;
- };
-
- The fields have the following functions:
-
- a_midmag This field is stored in network byte-order so that binaries
- for machines with alternate byte orders can be distinguished.
- It has a number of sub-components accessed by the macros
- N_GETFLAG(), N_GETMID(), and N_GETMAGIC(), and set by the macro
- N_SETMAGIC().
-
- The macro N_GETFLAG() returns a few flags:
-
- EX_DYNAMIC indicates that the executable requires the services
- of the run-time link editor.
-
- EX_PIC indicates that the object contains position inde-
- pendent code. This flag is set by as(1) when given
- the `-k' flag and is preserved by ld(1) if neces-
- sary.
-
- If both EX_DYNAMIC and EX_PIC are set, the object file is a
- position-independent executable image (e.g. a shared library),
- which is to be loaded into the process address space by the
- run-time link editor.
-
- The macro N_GETMID() returns the machine-id. This indicates
- which machine(s) the binary is intended to run on.
-
- The macro N_GETMAGIC() returns the magic number, which uniquely
- identifies binary files and distinguishes different loading
- conventions. The field must contain one of the following values:
-
- OMAGIC The text and data segments immediately follow the head-
- er and are contiguous. The kernel loads both text and
- data segments into writable memory.
-
- NMAGIC As with OMAGIC, text and data segments immediately fol-
- low the header and are contiguous. However, the kernel
- loads the text into read-only memory and loads the data
- into writable memory at the next page boundary after
- the text.
-
- ZMAGIC The kernel loads individual pages on demand from the
- binary. The header, text segment and data segment are
- all padded by the link editor to a multiple of the page
- size. Pages that the kernel loads from the text seg-
- ment are read-only, while pages from the data segment
- are writable.
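
The decoding of a_midmag described above can be sketched in C. This is a minimal sketch, not the contents of <a.out.h>: the sk_ names are hypothetical, and the bit layout (flags in the top 6 bits, machine-id in the next 10, magic number in the low 16) and the octal magic values are assumptions taken from 4.4BSD-derived headers rather than from the text above.

```c
#include <stdint.h>

/* Assumed traditional magic numbers (octal). */
enum { SK_OMAGIC = 0407, SK_NMAGIC = 0410, SK_ZMAGIC = 0413 };

/* Assemble the first four header bytes as a network-order (big-endian) word. */
uint32_t sk_midmag(const unsigned char hdr[4])
{
    return ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8)  |  (uint32_t)hdr[3];
}

/* Assumed layout: flags in bits 26-31, machine-id in bits 16-25, magic in bits 0-15. */
unsigned sk_getmagic(uint32_t midmag) { return midmag & 0xffff; }
unsigned sk_getmid(uint32_t midmag)   { return (midmag >> 16) & 0x03ff; }
unsigned sk_getflag(uint32_t midmag)  { return (midmag >> 26) & 0x3f; }

/* Nonzero if the magic number is none of OMAGIC, NMAGIC or ZMAGIC. */
int sk_badmag(uint32_t midmag)
{
    unsigned m = sk_getmagic(midmag);
    return m != SK_OMAGIC && m != SK_NMAGIC && m != SK_ZMAGIC;
}
```

Because the word is assembled byte by byte, the sketch gives the same result on hosts of either byte order.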
-
- a_text Contains the size of the text segment in bytes.
-
- a_data Contains the size of the data segment in bytes.
-
- a_bss Contains the number of bytes in the `bss segment' and is used
- by the kernel to set the initial break (brk(2)) after the data
- segment. The kernel loads the program so that this amount of
- writable memory appears to follow the data segment and
- initially reads as zeroes.
-
- a_syms Contains the size in bytes of the symbol table section.
-
- a_entry Contains the address in memory of the entry point of the
- program after the kernel has loaded it; the kernel starts the
- execution of the program from the machine instruction at this
- address.
-
- a_trsize Contains the size in bytes of the text relocation table.
-
- a_drsize Contains the size in bytes of the data relocation table.
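
The way a_text, a_data and a_bss combine with the magic number to give a memory layout can be sketched as follows. This is an illustration under stated assumptions: the sk_ names, the text_base parameter and the page_size parameter are hypothetical, and real kernels choose the base address and page size per machine.

```c
#include <stdint.h>

/* Assumed traditional magic numbers (octal). */
enum { SK_OMAGIC = 0407, SK_NMAGIC = 0410, SK_ZMAGIC = 0413 };

struct sk_layout {
    uint32_t text_addr;    /* start of the text segment in memory */
    uint32_t data_addr;    /* start of the data segment */
    uint32_t bss_addr;     /* start of the zero-filled bss area */
    uint32_t initial_brk;  /* first address past the bss area */
};

/* Round v up to a multiple of page (page must be nonzero). */
uint32_t sk_round_up(uint32_t v, uint32_t page)
{
    return (v + page - 1) / page * page;
}

struct sk_layout sk_memory_layout(unsigned magic, uint32_t text_base,
                                  uint32_t page_size, uint32_t a_text,
                                  uint32_t a_data, uint32_t a_bss)
{
    struct sk_layout l;

    l.text_addr = text_base;
    /* NMAGIC: data starts at the next page boundary after the text.
     * OMAGIC: data is contiguous with the text.
     * ZMAGIC: a_text is assumed to be padded to a page multiple already. */
    l.data_addr = text_base + a_text;
    if (magic == SK_NMAGIC)
        l.data_addr = sk_round_up(l.data_addr, page_size);
    l.bss_addr = l.data_addr + a_data;
    l.initial_brk = l.bss_addr + a_bss;
    return l;
}
```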
-
- The a.out.h include file defines several macros which use an exec
- structure to test consistency or to locate section offsets in the
- binary file.
-
- N_BADMAG(exec) Nonzero if the a_midmag field does not contain a
- recognized magic number.
-
- N_TXTOFF(exec) The byte offset in the binary file of the beginning of
- the text segment.
-
- N_SYMOFF(exec) The byte offset of the beginning of the symbol table.
-
- N_STROFF(exec) The byte offset of the beginning of the string table.
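
Since the sections appear in the file in the order listed earlier, the offsets these macros return can be approximated by summing the sizes in the header. The following is a sketch, not the real macro definitions: struct sk_exec mirrors the exec structure with 32-bit fields (an assumption), and text_off is supplied by the caller, for example sizeof(struct sk_exec) when text immediately follows the header, or the page size when the header is padded.

```c
#include <stdint.h>

/* Stand-in for struct exec, assuming a 32-bit unsigned long. */
struct sk_exec {
    uint32_t a_midmag, a_text, a_data, a_bss;
    uint32_t a_syms, a_entry, a_trsize, a_drsize;
};

/* Byte offsets in the file of each section after the header. */
struct sk_offsets {
    uint32_t text, data, trel, drel, sym, str;
};

struct sk_offsets sk_section_offsets(const struct sk_exec *ex, uint32_t text_off)
{
    struct sk_offsets o;

    o.text = text_off;                /* text segment */
    o.data = o.text + ex->a_text;     /* data segment */
    o.trel = o.data + ex->a_data;     /* text relocations */
    o.drel = o.trel + ex->a_trsize;   /* data relocations */
    o.sym  = o.drel + ex->a_drsize;   /* symbol table */
    o.str  = o.sym  + ex->a_syms;     /* string table */
    return o;
}
```

Sections with a size of zero are simply absent, so consecutive offsets may be equal.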
-
- Relocation records have a standard format which is described by the
- relocation_info structure:
-
- struct relocation_info {
- int r_address;
- unsigned int r_symbolnum : 24,
- r_pcrel : 1,
- r_length : 2,
- r_extern : 1,
- r_baserel : 1,
- r_jmptable : 1,
- r_relative : 1,
- r_copy : 1;
- };
-
- The relocation_info fields are used as follows:
-
- r_address Contains the byte offset of a pointer that needs to be
- link-edited. Text relocation offsets are reckoned from the
- start of the text segment, and data relocation offsets from
- the start of the data segment. The link editor adds the value
- that is already stored at this offset into the new value
- that it computes using this relocation record.
-
- r_symbolnum Contains the ordinal number of a symbol structure in the
- symbol table (it is not a byte offset). After the link
- editor resolves the absolute address for this symbol, it adds
- that address to the pointer that is undergoing relocation.
- (If the r_extern bit is clear, the situation is different;
- see below.)
-
- r_pcrel If this is set, the link editor assumes that it is updating
- a pointer that is part of a machine code instruction using
- pc-relative addressing. The address of the relocated
- pointer is implicitly added to its value when the running
- program uses it.
-
- r_length Contains the log base 2 of the length of the pointer in
- bytes; 0 for 1-byte displacements, 1 for 2-byte
- displacements, 2 for 4-byte displacements.
-
- r_extern Set if this relocation requires an external reference; the
- link editor must use a symbol address to update the pointer.
- When the r_extern bit is clear, the relocation is `local';
- the link editor updates the pointer to reflect changes in
- the load addresses of the various segments, rather than
- changes in the value of a symbol (except when r_baserel is
- also set; see below). For a local relocation, the content of
- the r_symbolnum field is an n_type value (see below); this
- type field tells the link editor what segment the relocated
- pointer points into.
-
- r_baserel If set, the symbol, as identified by the r_symbolnum field,
- is to be relocated to an offset into the Global Offset
- Table. At run-time, the entry in the Global Offset Table at
- this offset is set to be the address of the symbol.
-
- r_jmptable If set, the symbol, as identified by the r_symbolnum field,
- is to be relocated to an offset into the Procedure Linkage
- Table.
-
- r_relative If set, this relocation is relative to the (run-time) load
- address of the image this object file is going to be a part
- of. This type of relocation only occurs in shared objects.
-
- r_copy If set, this relocation record identifies a symbol whose
- contents should be copied to the location given in
- r_address. The copying is done by the run-time link-editor
- from a suitable data item in a shared object.
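
The basic arithmetic behind r_address and r_length (add the resolved value to whatever is already stored at the offset, using a pointer width of 2 to the power r_length) can be sketched as below. This is a simplified illustration, not the algorithm used by ld(1): the function name is hypothetical, the segment is assumed to be in host byte order, and pc-relative, baserel, jmptable, relative and copy handling are deliberately left out.

```c
#include <stdint.h>
#include <string.h>

/*
 * Add `value` to the pointer stored at byte offset r_address within the
 * segment image `seg`.  Returns 0 on success, or -1 if r_length is not
 * 0, 1 or 2 or the pointer would lie outside the segment.
 */
int sk_apply_reloc(unsigned char *seg, size_t seg_size, uint32_t r_address,
                   unsigned r_length, uint32_t value)
{
    size_t width;

    if (r_length > 2)
        return -1;
    width = (size_t)1 << r_length;          /* 1, 2 or 4 bytes */
    if (r_address > seg_size || width > seg_size - r_address)
        return -1;

    if (r_length == 0) {
        uint8_t v;
        memcpy(&v, seg + r_address, sizeof v);
        v = (uint8_t)(v + value);
        memcpy(seg + r_address, &v, sizeof v);
    } else if (r_length == 1) {
        uint16_t v;
        memcpy(&v, seg + r_address, sizeof v);
        v = (uint16_t)(v + value);
        memcpy(seg + r_address, &v, sizeof v);
    } else {
        uint32_t v;
        memcpy(&v, seg + r_address, sizeof v);
        v += value;                          /* stored addend + new value */
        memcpy(seg + r_address, &v, sizeof v);
    }
    return 0;
}
```

Using memcpy() instead of a pointer cast avoids alignment problems, since relocated pointers need not be aligned within the segment.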
-
- Symbols map names to addresses (or more generally, strings to values).
- Since the link-editor adjusts addresses, a symbol's name must be used to
- stand for its address until an absolute value has been assigned. Symbols
- consist of a fixed-length record in the symbol table and a
- variable-length name in the string table. The symbol table is an array of nlist
- structures:
-
- struct nlist {
- union {
- char *n_name;
- long n_strx;
- } n_un;
- unsigned char n_type;
- char n_other;
- short n_desc;
- unsigned long n_value;
- };
-
- The fields are used as follows:
-
- n_un.n_strx Contains a byte offset into the string table for the name of
- this symbol. When a program accesses a symbol table with
- the nlist(3) function, this field is replaced with the
- n_un.n_name field, which is a pointer to the string in
- memory.
-
- n_type Used by the link editor to determine how to update the
- symbol's value. The n_type field is broken down into three
- sub-fields using bitmasks. The link editor treats symbols
- with the N_EXT type bit set as `external' symbols and
- permits references to them from other binary files. The N_TYPE
- mask selects bits of interest to the link editor:
-
- N_UNDF An undefined symbol. The link editor must locate an
- external symbol with the same name in another binary
- file to determine the absolute value of this symbol.
- As a special case, if the n_value field is nonzero
- and no binary file in the link-edit defines this
- symbol, the link-editor will resolve this symbol to
- an address in the bss segment, reserving an amount
- of bytes equal to n_value. If this symbol is
- undefined in more than one binary file and the binary
- files do not agree on the size, the link editor
- chooses the greatest size found across all binaries.
-
- N_ABS An absolute symbol. The link editor does not update
- an absolute symbol.
-
- N_TEXT A text symbol. This symbol's value is a text
- address and the link editor will update it when it
- merges binary files.
-
- N_DATA A data symbol; similar to N_TEXT but for data ad-
- dresses. The values for text and data symbols are
- not file offsets but addresses; to recover the file
- offsets, it is necessary to identify the loaded ad-
- dress of the beginning of the corresponding section
- and subtract it, then add the offset of the section.
-
- N_BSS A bss symbol; like text or data symbols but has no
- corresponding offset in the binary file.
-
- N_FN A filename symbol. The link editor inserts this
- symbol before the other symbols from a binary file
- when merging binary files. The name of the symbol
- is the filename given to the link editor, and its
- value is the first text address from that binary
- file. Filename symbols are not needed for link-
- editing or loading, but are useful for debuggers.
-
- The N_STAB mask selects bits of interest to symbolic debug-
- gers such as gdb(1); the values are described in stab(5).
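
The three sub-fields of n_type can be separated with bitmasks along these lines. This is a sketch under assumptions: the SK_ constants use the values traditionally found in BSD headers (they are not given in the text above), N_FN is omitted because its value differs between systems, and the one-letter classes are an illustrative convention loosely modelled on symbol listers, not something this format defines.

```c
#include <stdint.h>

enum {
    SK_N_EXT  = 0x01,   /* external bit */
    SK_N_TYPE = 0x1e,   /* bits of interest to the link editor */
    SK_N_STAB = 0xe0,   /* bits of interest to symbolic debuggers */
    SK_N_UNDF = 0x00,
    SK_N_ABS  = 0x02,
    SK_N_TEXT = 0x04,
    SK_N_DATA = 0x06,
    SK_N_BSS  = 0x08
};

/*
 * Classify a symbol: '-' for debugger symbols, otherwise a letter for
 * the N_TYPE value, upper case when N_EXT is set and lower case when
 * it is clear.  An undefined symbol with a nonzero n_value is reported
 * as 'C', since it reserves that many bytes of bss if left undefined.
 */
char sk_symbol_class(unsigned char n_type, uint32_t n_value)
{
    char c;

    if (n_type & SK_N_STAB)
        return '-';
    switch (n_type & SK_N_TYPE) {
    case SK_N_UNDF: return n_value != 0 ? 'C' : 'U';
    case SK_N_ABS:  c = 'A'; break;
    case SK_N_TEXT: c = 'T'; break;
    case SK_N_DATA: c = 'D'; break;
    case SK_N_BSS:  c = 'B'; break;
    default:        return '?';
    }
    return (n_type & SK_N_EXT) ? c : (char)(c - 'A' + 'a');
}
```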
-
- n_other This field provides information on the nature of the symbol
- independent of the symbol's location in terms of segments as
- determined by the n_type field. Currently, the lower 4 bits
- of the n_other field hold one of two values: AUX_FUNC and
- AUX_OBJECT (see <link.h> for their definitions). AUX_FUNC
- associates the symbol with a callable function, while
- AUX_OBJECT associates the symbol with data, irrespective of
- their locations in either the text or the data segment.
- This field is intended to be used by ld(1) for the
- construction of dynamic executables.
-
- n_desc Reserved for use by debuggers; passed untouched by the link
- editor. Different debuggers use this field for different
- purposes.
-
- n_value Contains the value of the symbol. For text, data and bss
- symbols, this is an address; for other symbols (such as
- debugger symbols), the value may be arbitrary.
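
The recipe given under N_DATA for turning a text or data symbol's address back into a file offset (subtract the loaded address of the section, then add the section's file offset) can be written out as a small helper. The function and parameter names are hypothetical; the caller is assumed to know where the segment was loaded and where it starts in the file.

```c
#include <stdint.h>

/*
 * Convert a symbol address into a byte offset in the binary file.
 * seg_addr is the loaded address of the start of the segment, seg_size
 * its size in bytes, and seg_file_off its offset in the file.
 * Returns 0 and stores the result in *off, or -1 if the address does
 * not fall inside the segment (as for bss symbols, which have no
 * corresponding file offset).
 */
int sk_addr_to_offset(uint32_t n_value, uint32_t seg_addr, uint32_t seg_size,
                      uint32_t seg_file_off, uint32_t *off)
{
    if (n_value < seg_addr || n_value - seg_addr >= seg_size)
        return -1;
    *off = n_value - seg_addr + seg_file_off;
    return 0;
}
```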
-
- The string table consists of an unsigned long length followed by
- null-terminated symbol strings. The length represents the size of the entire
- table in bytes, so its minimum value (or the offset of the first string)
- is always 4 on 32-bit machines.
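
Looking up a symbol name from an n_strx offset then amounts to bounds-checking the offset against the length word. The sketch below assumes a 4-byte length stored in host byte order and a string table already read into memory; sk_symbol_name and its parameters are hypothetical names, not part of <a.out.h>.

```c
#include <stdint.h>
#include <string.h>

/*
 * Return the name at byte offset n_strx in the string table held in
 * strtab[0..buf_len-1], or NULL if the table or the offset is invalid.
 * The first four bytes hold the size of the whole table, so valid
 * offsets start at 4.
 */
const char *sk_symbol_name(const unsigned char *strtab, size_t buf_len,
                           uint32_t n_strx)
{
    uint32_t size;

    if (buf_len < sizeof size)
        return NULL;
    memcpy(&size, strtab, sizeof size);      /* length word, host order */
    if (size < sizeof size || size > buf_len)
        return NULL;
    if (n_strx < sizeof size || n_strx >= size)
        return NULL;
    /* Reject names that are not terminated inside the table. */
    if (memchr(strtab + n_strx, '\0', size - n_strx) == NULL)
        return NULL;
    return (const char *)(strtab + n_strx);
}
```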
-
- SEE ALSO
- as(1), gdb(1), ld(1), brk(2), execve(2), nlist(3), core(5),
- dbx(5), stab(5), link(5)
-
- HISTORY
- The a.out.h include file appeared in Version 7 AT&T UNIX.
-
- BUGS
- Nobody seems to agree on what bss stands for.
-
- New binary file formats may be supported in the future, and they probably
- will not be compatible at any level with this ancient format.
-
- BSD Experimental June 5, 1993 5
-